Abstract: The recognition of methyl-lysine and methyl-arginine residues on histone and non-histone proteins by specific "reader" proteins is important for chromatin regulation, gene expression, and control of cell-cycle progression. Recently, the crucial role of these reader proteins in cancer development and dedifferentiation has emerged, prompting increased interest from the scientific community. The methyl-lysine and -arginine readers are a large and highly diverse set of effector proteins, and targeting them with small-molecule probes in drug discovery will inevitably require a detailed understanding of their structural biology and binding mechanisms. In this review, the critical elements of methyl-lysine and -arginine recognition are summarized for each protein family, and initial results in assay development, probe design, and drug discovery are highlighted.
INTRODUCTION AND PROTEIN FAMILIES

Methylation of lysine and arginine residues on histones and other proteins plays a crucial role in the activation and repression of gene expression; consequently, the study of epigenetic regulation has recently emerged as both a major challenge and an opportunity in the biomedical sciences [1, 2]. Methylation is often considered to be one letter in a diverse alphabet of histone post-translational modifications (PTMs), which also includes phosphorylation, ubiquitination, acetylation, and glycosylation, among others [3, 4]. PTMs play a critical role in the regulation of signaling pathways, where they can serve as chemical switches that induce or repress protein-protein interactions [5]. In particular, methylation marks have been shown to initiate the recruitment of non-histone proteins that dictate higher-order chromatin structure and function, including gene activation and repression [3, 6].

Understanding the role of histone PTMs is complicated by the fact that the same chemical modification at different positions within a single protein can have different effects on gene expression. For example, within the histone H3 tail, methylation of lysine 4 (H3K4), H3K36, or H3K79 results in transcriptional activation, whereas methylation of H3K9 or H3K27 results in transcriptional repression. The situation is further complicated by the fact that both arginine and lysine can be methylated to different degrees, so that the overall transcriptional outcome is dictated by both the position of the residue and its degree of methylation [7]. Histone methyltransferases can add up to three methyl groups to a single lysine side chain, while arginine can exist as either monomethyl- or dimethylarginine, the latter in either a symmetric (sRme2) or an asymmetric (aRme2) form (Fig. 1).
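As a compact summary of the states in Fig. 1, the sketch below tabulates how many methyl groups each state carries and how many N-H hydrogen-bond donors remain on the cationic head group, one of the physical properties that distinguishes methylation states. This is an illustration added here rather than material from the cited work; the labels Kme0, Rme0, and Rme1 are shorthand, and the counts assume the fully protonated side chains present at physiological pH.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MethylState:
    label: str          # abbreviation for the methylation state
    methyl_groups: int  # methyl groups on the side-chain nitrogen(s)
    nh_donors: int      # N-H bonds left on the cationic head group


# Lysine: the protonated epsilon-ammonium (-NH3+) loses one N-H per methyl group.
LYSINE = [MethylState(f"Kme{n}", n, 3 - n) for n in range(4)]

# Arginine: the guanidinium group carries five N-H bonds
# (one on N-epsilon, two on each terminal omega nitrogen).
ARGININE = [
    MethylState("Rme0", 0, 5),
    MethylState("Rme1", 1, 4),   # one methyl on an omega nitrogen
    MethylState("sRme2", 2, 3),  # one methyl on each omega nitrogen
    MethylState("aRme2", 2, 3),  # both methyls on the same omega nitrogen
]


def nh_donors(label: str) -> int:
    """Look up the number of remaining N-H donors for a state label."""
    for state in LYSINE + ARGININE:
        if state.label == label:
            return state.nh_donors
    raise KeyError(label)


if __name__ == "__main__":
    for state in LYSINE + ARGININE:
        print(f"{state.label:6s} methyls={state.methyl_groups} N-H={state.nh_donors}")
```

Note that sRme2 and aRme2 have the same donor count; what differs is how the remaining N-H bonds and methyl groups are distributed over the two terminal nitrogens, which is the kind of shape difference a reader must detect.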

Although the mechanisms by which cells decipher a PTM-mediated histone code are far from completely understood, it is becoming clear that histone PTMs are read by effector proteins that facilitate downstream events via the recruitment and/or stabilization of chromatin-templated machinery [8, 9]. Exploration of these epigenetic readers is therefore key to furthering our understanding of how histone PTMs regulate complex biological functions [10, 11]. The identification of binding modules for methylated lysines has been largely successful, whereas protein receptors for methylated arginines in histones have received less attention to date [12]. While methyl-lysine and -arginine readers are most commonly associated with the recognition of histone modifications, they are also known to interact with methylation marks on non-histone proteins, as discussed in further detail below. The identification of novel reader proteins remains a challenge, as does the broader goal of understanding the relationship between PTM-binding proteins and human disease.

Current estimates of the number of methyl-lysine binding proteins in the human proteome exceed 170 [13], and this number continues to grow with ongoing research. Despite various structural and functional differences, methyl-lysine and -arginine readers share many common features that facilitate their recognition of these PTMs. All methylated forms of lysine are cationic at physiological pH, and trimethyllysine carries a fixed positive charge irrespective of its environment. Because size, hydrophobicity, distribution of positive charge, and ability to serve as a hydrogen bond donor differ between methylation states, each PTM is recognized by a protein reader adapted to these specific physical properties. A subtle change in methylation state can therefore alter the resulting protein-protein interaction, with profound consequences for gene regulation and expression. A recent publication analyzed the effect of methylation state on binding to one of these effector proteins (L3MBTL1) using molecular dynamics and free energy perturbation techniques, combined with biophysical binding data in a small-molecule model system [14]. A greater understanding of the atomic-level mechanisms of methyl-lysine recognition will help in understanding far more complex phenomena, including how effector proteins control many biological processes.

Fig. (1). Different methylation states of lysine (A) and arginine (B).

The conserved recognition of methyl-lysine marks is largely mediated by the interaction between the methylammonium group and aromatic residues of the protein receptor, which form an aromatic "cage" around the PTM. Such aromatic cages tend to be relatively specific for a particular methylation state, discriminating between PTMs on the basis of differences in size and shape. The binding interaction between the methylammonium group and the aromatic cage results largely from cation-π interactions, although hydrophobic desolvation effects also play a substantial role. The cation-π interaction is generally thought of as a charge-quadrupole interaction between a positively charged species and an aromatic ring, and is primarily electrostatic in nature [15, 16]. The importance of cation-π interactions in proteins was described by Burley and Petsko in 1986 [17], and this recognition motif is highly conserved in many protein-protein interactions. In the recognition of the lower methylation states, hydrogen bonding and steric exclusion become increasingly important. Depending on the methylation state, nearby acidic residues in the protein can also form salt bridges with the methylated lysine residue, providing additional stabilization [18]. Based on current understanding, the lower methylation states of lysine (Kme1 and Kme2) bind via a cavity-insertion recognition mode, in which the methylammonium group is deeply buried within the protein while neighboring residues of the histone peptide make few interactions, so that little sequence selectivity is observed in vitro. In contrast, trimethyllysine (Kme3) is predominantly recognized via surface-groove recognition, in which the peptide lies along the protein surface, enabling surrounding residues and the peptide backbone to form additional interactions with the effector protein and leading to more sequence-selective binding [12]. Our understanding of these different recognition modes has been, and will continue to be, significantly advanced by the availability of crystal structures of these domains.
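To make the charge-quadrupole picture of the cation-π interaction concrete, the leading electrostatic term for a point charge q on the symmetry axis of an axially symmetric quadrupole (such as an aromatic ring) at distance r is E = qΘzz/(4πε₀r³), with Θzz in the Buckingham convention. Aromatic rings such as benzene have a negative Θzz (π-electron density above and below the ring plane), so a cation on the ring axis is attracted. The sketch assumes vacuum and an idealized point-charge/point-quadrupole pair; it ignores polarization, dispersion, and the desolvation effects noted above, so it illustrates sign and distance scaling rather than predicting binding energies. The quadrupole value in the usage line is an illustrative assumption, not a measured value for any residue.

```python
import math

EPSILON_0 = 8.8541878128e-12  # vacuum permittivity, F/m
E_CHARGE = 1.602176634e-19    # elementary charge, C
AVOGADRO = 6.02214076e23      # Avogadro constant, 1/mol


def charge_quadrupole_energy(q_e: float, theta_zz: float, r: float) -> float:
    """Leading-order energy (J) of a point charge q_e (in units of e) on the
    symmetry axis of an axially symmetric quadrupole theta_zz (C m^2,
    Buckingham convention) at distance r (m), in vacuum:
        E = q * theta_zz / (4 * pi * eps0 * r**3)
    """
    q = q_e * E_CHARGE
    return q * theta_zz / (4.0 * math.pi * EPSILON_0 * r ** 3)


def joules_to_kj_per_mol(energy: float) -> float:
    """Convert a per-pair energy in joules to kJ/mol."""
    return energy * AVOGADRO / 1000.0


if __name__ == "__main__":
    theta = -3.0e-39  # illustrative negative quadrupole (assumption), C m^2
    for r_angstrom in (3.5, 4.5, 5.5):
        e = charge_quadrupole_energy(+1.0, theta, r_angstrom * 1e-10)
        print(f"r = {r_angstrom} A: E = {joules_to_kj_per_mol(e):.1f} kJ/mol")
```

Because the term falls off as r⁻³, doubling the cation-ring distance weakens it eightfold, one way to see why the precise geometry of an aromatic cage matters.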

While much attention in this field has focused on the readers of methylated lysines, arginine methylation has similarly been identified as a key player in the regulation of cellular processes [19-21]. Although our present knowledge of methyl-arginine effector proteins is limited, there is evidence that methyl-arginine mediates protein-protein interactions [22]. As with lysine, there is precedent for the stacking of arginine with aromatic residues via cation-π interactions [23-25]. Arginine's ability to bind aromatic residues stems from its capacity to interact through a combination of cation-π and π-π stacking interactions, and methylation of arginine is therefore expected to strengthen this interaction.

Despite a conserved cation-π recognition motif across most methyl-lysine and -arginine binding proteins, the protein readers also belong to distinct families based on other specific characteristics. Consideration of the differences between reader families will aid in the development of selective probes and drugs targeting specific proteins. Recognition of methyl-lysine and -arginine residues generally occurs via a specific subdomain within the protein. The known methyl-lysine binding domains are the plant homeodomain (commonly referred to as the PHD finger); the Royal Family domains, comprising the Tudor, Agenet, chromo, PWWP (proline-tryptophan-tryptophan-proline), and MBT (malignant brain tumor) domains; and finally the WD40 repeat domain [26]. A recent review of the structural biology of PTM reader proteins gives an excellent overview of their structural features and subdomains [27].

Current drug discovery efforts are largely directed toward the enzymes that install and remove lysine methylation marks (PTM writers and erasers, respectively) [28]. In contrast, binding proteins for histone PTMs have only recently emerged as valid targets for probe research and drug discovery. For example, Filippakopoulos and co-workers have recently shown that bromodomains, the readers of acetyl-lysine, can be targeted by small-molecule probes [29, 30]. Methyl-lysine readers have received less attention to date for two reasons. First, they are usually characterized by low affinity for their respective peptide ligands, which makes high-throughput assays difficult. Second, their lack of enzymatic activity has hindered their validation as critical drug targets. Small-molecule antagonists of reader domains may offer in vivo efficacy and toxicity profiles different from those of inhibitors of the writer and eraser enzymes [31], and can also contribute greatly to a growing understanding of how small molecules can target protein-protein interactions. This review will focus primarily on the families of methyl-lysine readers and their potential in drug discovery, while the homologous readers of methyl-arginine will be discussed within the related lysine reader family.
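Why the low affinity noted above complicates assay design can be seen from the simple 1:1 binding isotherm. Assuming the peptide is in large excess over the reader (so free and total peptide concentrations are approximately equal), the fraction of reader occupied is θ = [L]/([L] + Kd), which rearranges to [L] = Kd·θ/(1 − θ). The sketch below is a generic illustration; the two Kd values in the usage block are hypothetical and not taken from the cited work.

```python
def fraction_bound(ligand_molar: float, kd_molar: float) -> float:
    """Fraction of reader occupied in a simple 1:1 equilibrium, assuming the
    ligand is in large excess so its free concentration ~ total concentration."""
    if ligand_molar < 0 or kd_molar <= 0:
        raise ValueError("ligand must be non-negative and Kd positive")
    return ligand_molar / (ligand_molar + kd_molar)


def ligand_for_occupancy(target_fraction: float, kd_molar: float) -> float:
    """Ligand concentration needed for a target occupancy (0 < f < 1),
    from rearranging theta = L / (L + Kd) to L = Kd * f / (1 - f)."""
    if not 0.0 < target_fraction < 1.0:
        raise ValueError("target fraction must lie strictly between 0 and 1")
    return kd_molar * target_fraction / (1.0 - target_fraction)


if __name__ == "__main__":
    # Hypothetical readers: a weak (Kd = 50 uM) and a tight (Kd = 50 nM) binder.
    for kd in (50e-6, 50e-9):
        needed = ligand_for_occupancy(0.9, kd)
        print(f"Kd = {kd:.0e} M -> {needed * 1e6:.2f} uM peptide for 90% occupancy")
```

Under these assumptions, a reader with a Kd in the tens of micromolar needs peptide in the hundreds of micromolar range to be mostly occupied, which is demanding at screening scale.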

MBT FAMILY 

The malignant brain tumor (MBT) domain recognizes the lower methylation states of lysine (Kme1 and Kme2) on both histones and other regulatory proteins. The MBT family comprises nine human members, each of which contains MBT subdomains occurring in tandem repeats of two to four units (Fig. 2). The MBT domains are often flanked by other subdomains, which in some cases are suggested to assist in dimerization (SPM domain) or to support DNA binding (Zn-finger domain).

The MBT domain itself consists of ca. 100 amino acids, and highly conserved homologs can be found in humans, Drosophila, and C. elegans. A recent review of the MBT family highlights its structural features and the cellular functions of each of its members [32]. The interaction between MBT readers and methylated lysine residues is best described as a cavity-insertion mode of recognition [12], and the available co-crystal structures of five different domains show few interactions beyond the methylated lysine residue. A consequence of this localized interaction is that MBT domains are almost entirely non-sequence-selective, as long as the methylated peptide has a high isoelectric point [33]. Most of the MBT proteins can be evolutionarily linked to one of three Drosophila orthologs: dL3MBTL; dSCM (Sex Comb on Midleg), of which one human ortholog, SCML2, is shown in Fig. (2B); or dSFMBT (SCM-related gene containing Four MBT domains), which contains four MBT repeats [34].
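To put "high isoelectric point" in quantitative terms, a peptide's pI can be estimated as the pH at which its Henderson-Hasselbalch net charge is zero. The sketch below is an illustration, not the method of the cited study: the pKa values are an assumed, approximate textbook-style set (published scales differ), and methylated lysines are treated like unmodified lysine. The basic example is histone H4 residues 16-23 (KRHRKVLR), which surround K20; the acidic peptide is hypothetical.

```python
# Assumed, approximate pKa values (one textbook-style set); other published
# scales differ, so computed pI values are rough estimates only.
PKA_POSITIVE = {"Nterm": 8.0, "K": 10.5, "R": 12.5, "H": 6.0}
PKA_NEGATIVE = {"Cterm": 3.1, "D": 3.9, "E": 4.1, "C": 8.3, "Y": 10.1}


def net_charge(sequence: str, ph: float) -> float:
    """Henderson-Hasselbalch estimate of a peptide's net charge at a given pH.
    Methylated lysines are treated like lysine (an assumption)."""
    counts = {aa: sequence.count(aa) for aa in set(sequence)}
    positive = 1.0 / (1.0 + 10 ** (ph - PKA_POSITIVE["Nterm"]))
    negative = 1.0 / (1.0 + 10 ** (PKA_NEGATIVE["Cterm"] - ph))
    for aa in ("K", "R", "H"):
        positive += counts.get(aa, 0) / (1.0 + 10 ** (ph - PKA_POSITIVE[aa]))
    for aa in ("D", "E", "C", "Y"):
        negative += counts.get(aa, 0) / (1.0 + 10 ** (PKA_NEGATIVE[aa] - ph))
    return positive - negative


def isoelectric_point(sequence: str, tol: float = 1e-4) -> float:
    """Bisection on pH: the net charge falls monotonically as pH rises."""
    low, high = 0.0, 14.0
    while high - low > tol:
        mid = (low + high) / 2.0
        if net_charge(sequence, mid) > 0:
            low = mid   # still positive: the pI lies at higher pH
        else:
            high = mid
    return (low + high) / 2.0


if __name__ == "__main__":
    for name, seq in (("H4 16-23", "KRHRKVLR"), ("acidic (hypothetical)", "DEEDAK")):
        print(f"{name}: estimated pI ~ {isoelectric_point(seq):.1f}")
```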

MBT domains are essential for transcriptional regulation and share an overall conserved binding mechanism. For example, L3MBTL1 has a slight preference for monomethyllysine over the dimethyl analog. Its aromatic binding pocket is formed by a tyrosine, a phenylalanine, and a tryptophan residue, with an essential hydrogen bond between the lysine ammonium group and an aspartic acid (D355, Fig. 3A). In contrast, SCML2 and SCMH1 are the only MBT proteins in which the MBT repeat occurs in tandem, and they exhibit a stronger preference for monomethyllysine [35-37]. In these two reader proteins, the binding cavity contains a phenylalanine in place of the tyrosine found in L3MBTL1 (Fig. 3B).

Many members of the MBT family and their related biological functions have been studied in detail; however, this review will focus on L3MBTL1 and L3MBTL3. The structural biology of L3MBTL1 has been well covered in recent literature [38], providing a valuable starting point for ligand-based drug design. Furthermore, L3MBTL1 has been shown to act as a chromatin lock [39] and to be important for transcriptional repression [33]. This methyl-lysine reader represses the expression of E2F-regulated genes, such as the oncogenic and growth-related c-myc gene. Consistent with the low sequence selectivity of the MBT family, L3MBTL1 has also been shown to bind the tumor suppressor p53 through recognition of methylated lysine 382 [40]. It has also been suggested that L3MBTL1 might bind the retinoblastoma (Rb) protein in an analogous fashion and consequently facilitate repression of c-myc. In addition, the Nimer lab has recently shown that L3MBTL1 positively influences genomic stability [41], while Perna and co-workers have demonstrated the effect of L3MBTL1 on erythroid differentiation [42].

Fig. (2). Example of tandem repeats of MBT domains: A) L3MBTL1 bound to H4K20me2 (PDB: 2RJF) and B) SCML2 bound to a monomethylated lysine (PDB: 2VYT). Protonated methyl-lysines are displayed in ball-and-stick model with gray carbon atoms; key binding site residues are displayed in stick model with white carbon atoms.

Fig. (3). A) Binding cavity of L3MBTL1 bound to H4K20me2 (PDB: 2RJF) and B) binding cavity of SCML2 bound to a monomethylated lysine (PDB: 2VYT). Protonated methyl-lysines are displayed in ball-and-stick model with gray carbon atoms; key binding site residues are displayed in stick model with white carbon atoms.

Small-molecule ligand discovery for methyl-lysine binding domains is relatively unexplored territory. The first reported virtual screening (VS) study on MBT proteins was carried out by Kireev and co-workers [43], in which the iResearch Library (ChemNavigator), containing more than 50 million procurable compounds, was virtually screened for MBT ligands. Two complementary VS approaches were used: a substructure search for compounds containing Kme1- and Kme2-like side chains, and a pharmacophore screen followed by docking to find more structurally remote compounds mimicking the histone peptide interaction. A total of 51 compounds were subsequently purchased and tested against a panel of four MBT-containing proteins (L3MBTL1, L3MBTL3, L3MBTL4, and MBTD1) using an in vitro chemiluminescent assay (AlphaScreen) [44]. Nineteen compounds showed specific, dose-dependent protein binding activity and provided initial structure-activity information for lead generation (a selection of these small molecules is shown in Table 1).

Table 1. IC50 Values for Select Small-Molecule Ligands of Four MBT Domains as Identified by a Chemiluminescent Assay
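Table 1 reports IC50 values, but the source does not describe how they were derived from the AlphaScreen dose-response data. The sketch below is therefore a generic, hypothetical illustration: a four-parameter logistic model for a signal that falls as inhibitor concentration rises, and a crude IC50 estimate obtained by interpolating in log concentration between the two points that bracket the half-maximal signal. Real analyses typically fit the full logistic curve by nonlinear regression, and the simulated data here are not from Table 1.

```python
import math


def four_parameter_logistic(conc: float, bottom: float, top: float,
                            ic50: float, hill: float) -> float:
    """Signal that falls from `top` to `bottom` as inhibitor concentration rises."""
    return bottom + (top - bottom) / (1.0 + (conc / ic50) ** hill)


def interpolate_ic50(concs, responses):
    """Estimate IC50 by linear interpolation in log10(concentration) between
    the two points bracketing the half-maximal response (midway between the
    highest and lowest observed responses). A rough stand-in for a curve fit."""
    half = (max(responses) + min(responses)) / 2.0
    points = sorted(zip(concs, responses))
    for (c1, r1), (c2, r2) in zip(points, points[1:]):
        if (r1 - half) * (r2 - half) <= 0 and r1 != r2:
            frac = (r1 - half) / (r1 - r2)
            log_c = math.log10(c1) + frac * (math.log10(c2) - math.log10(c1))
            return 10 ** log_c
    raise ValueError("half-maximal response is not bracketed by the data")


if __name__ == "__main__":
    concs_um = [0.1, 1.0, 10.0, 100.0]
    signal = [four_parameter_logistic(c, 0.0, 100.0, 5.0, 1.0) for c in concs_um]
    print(f"interpolated IC50 ~ {interpolate_ic50(concs_um, signal):.2f} uM (true 5.00)")
```

On these simulated points the interpolated value lands somewhat below the true IC50 of 5 µM, partly because the sampled extremes fall short of the curve's plateaus and partly because the curve is not linear between sampled points; a full curve fit avoids both biases.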

More importantly, the first co-crystal structure of L3MBTL1 with a small-molecule ligand has recently been published (Fig. 4, PDB: 3P8H), along with a detailed study of the binding event using medicinal chemistry, mutagenesis, and multiple orthogonal assay formats [45]. The nicotinamide ligand binds in the second MBT domain of L3MBTL1 analogously to the native peptides. Its significantly larger and more rigid amine anchor makes a critical interaction with the acidic residue D355 and fills the binding pocket almost entirely (see Fig. 3A). These studies represent the first steps toward high-quality chemical probes for methyl-lysine binding domains, although improvements in potency, broader selectivity profiling, and cell-based evidence of activity and mechanism have yet to be addressed.



Fig. (4). Nicotinamide ligand (4) shown in the co-crystal structure with L3MBTL1 (PDB: 3P8H).

A similar involvement of MBT domains in differentiation has been observed for L3MBTL3, a close homolog of L3MBTL1. Northcott and co-workers reported the potential involvement of L3MBTL3 in malignant pediatric brain tumors [46]. High-resolution single nucleotide polymorphism (SNP) genotyping identified amplifications and deletions of tumor suppressor genes, including homo- and heterozygous deletions of genes encoding components of the histone lysine methylation machinery: the readers L3MBTL3, L3MBTL2, and SCML2, and the writers GLP (also known as EHMT1) and SMYD4.

Given recent literature highlighting the significance of L3MBTL1 and L3MBTL3 in oncogenesis, the ability to target these proteins directly would provide an ideal tool for cancer biology, especially as it is unknown whether these deregulated readers actively promote oncogenesis or are mere bystanders [47]. Small-molecule antagonists of MBT domain interactions could be instrumental in determining, for instance, the role of L3MBTL3 in the development of medulloblastomas, which remain the leading cause of cancer-related deaths in children.

PHD FINGERS 

The plant homeodomain, also referred to as the PHD finger, is a structural motif of approximately 50 to 80 amino acids. The PHD finger family is one of the largest and most diverse families of methyl-lysine readers, and within it two general classes have been identified based on their preferred binding partner: the first group binds di- and trimethylated lysine residues, whereas the second interacts with unmethylated lysine residues. PHD fingers that bind unmodified lysine residues apparently organize their binding pockets only in the presence of their binding partner, while the binders of di- and trimethyllysine are known to have a preorganized binding pocket, similar to the MBT domains described above [48]. Some PHD fingers are good examples of the surface-groove recognition mode discussed above. In those cases, the extended interaction of the methylated lysine and neighboring residues leads to selectivity for a specific PTM. For drug design, these additional interactions could also prove crucial in addressing questions of selectivity and potency. It is also worth noting that any probe design or drug discovery effort targeting readers of trimethyllysine, such as the PHD fingers or other domains discussed below, will require an uncharged substitute for the trimethylammonium group to achieve cell permeability.

Of the 81 known members of the family, a significant number have structural data available in the form of crystal structures (13) and NMR structures (14), both of which will be of great value in future efforts focused on the design of chemical probes. The ability of PHD fingers to bind both unmethylated and methylated lysine residues demonstrates the diversity of this family, and it is worthwhile to highlight distinct members of the PHD family to illustrate their structural biology (Fig. 5) and function.

PHF21A (BHC80) specifically interacts with unmethylated H3K4 (H3K4me0) (Fig. 5A) [49]. Unlike most other reader proteins, whose binding pockets feature an aromatic cage, PHF21A has no aromatic residues inside its binding pocket (Fig. 5A). The crystal structure revealed that specificity for H3K4me0 is achieved through a hydrogen bond between the lysine ε-amine and D489 of the binding pocket, further stabilized by an additional hydrogen bond between the amine and the backbone of E488. The structure also suggests that steric exclusion of the methyl group prevents methylated H3K4 from binding to this pocket. Knockdown of PHF21A by RNA interference results in the de-repression of LSD1 target genes, and repression is restored by reintroduction of wild-type PHF21A but not of the D489A mutant, which does not bind H3K4. These findings highlight the importance of PHF21A in gene repression and, more interestingly, suggest that unmethylated lysine residues are subject to specific reader interactions in the absence of a PTM.

The importance of PHD fingers in human disease was recently outlined in a review by Baker and co-workers [50]. In many cases, PHD fingers are linked to disease as a consequence of mutations or translocations of the native proteins. In such cases, drug discovery efforts focused on the mutant proteins rather than the wild-type proteins would likely be most effective. However, a few examples are known in which PHD fingers are associated with human disease and could be targeted directly. For example, Wang and co-workers have reported that a fusion protein containing a PHD finger (PHF23) and nucleoporin (NUP98) showed all the characteristics of a potent oncoprotein whose function depended on binding to H3K4me3 [51]. The phenotypic consequences of this fusion included arrested haematopoietic differentiation and acute myeloid leukemia. PHF23 has very high sequence homology with PHF13 (Fig. 5B), for which a crystal structure has been solved.

Besides the potential to combat cancer in innovative ways, the readers of methyl-lysine residues have also been shown to play a role in differentiation and stem cell self-renewal. Targeting these proteins may therefore open new avenues for the generation and maintenance of stem cells and advance the field of regenerative medicine. Walker and co-workers have recently reported that the interaction between polycomb-like protein 2 (PCL2) and polycomb repressive complex 2 (PRC2) is important for de-differentiation and self-renewal [52]. Interestingly, PCL2 contains a Tudor domain (discussed below) and two PHD fingers that are necessary to bind H3K27me3 and, in turn, direct PRC2 function to specific targets.

Fig. (5). Binding pockets of four representative PHD fingers: A) PHF21A (BHC80) bound to unmethylated H3K4 (PDB: 2PUY). B) PHF13 bound to H3K4me3 (PDB: 307A). C) PYGO1 bound to H3K4me2 (PDB: 2VPE). D) BPTF bound to H3K4me3 (PDB: 2F6J). In all four structures, PHD domains are shown in yellow ribbon and histone tails in red ribbon. Protonated methyl-lysines (unmethylated H3K4 in 5A) and trimethyl-lysines are displayed in ball-and-stick model with gray carbon atoms. Key binding site residues of the PHD fingers are displayed in stick model with white carbon atoms.

TUDOR DOMAINS 

Tudor domains belong to the Royal superfamily of methyl-lysine effector proteins and comprise structurally diverse proteins that display a range of recognition motifs, interacting with both higher and lower lysine methylation states. There are currently close to 40 known Tudor-containing proteins, with 12 available crystal structures [53]. Importantly, the Tudor family of proteins has been reported to be closely linked to gametogenesis in both Drosophila and mice, while also playing roles in various piRNA pathways [54]. As the role of Tudor domains as methyl-lysine and -arginine readers is not completely understood, small-molecule modulators of these protein-protein interactions would serve as valuable tools for developing a more thorough understanding of the role of Tudor domains in both disease and development.

Tudor domains are characterized by a bent antiparallel β-barrel, with conserved residues stabilizing the structure through formation of a hydrophobic core and an overall negatively charged surface [55]. Tudor proteins that recognize methyl-lysine are generally classified as tandem or double Tudor domains, whereas a single-Tudor-domain methyl-lysine reader has yet to be identified. JMJD2A is a well-known double Tudor domain protein that functions as both a demethylase and a methyl-lysine binding protein (Fig. 6A). The reader contains an interdigitated fold of two individual Tudor domains, linked by two shared β-strands. This interleaved, bilobal topology of the two domains is required to form a binding pocket able to interact with methylated histone H3K4 and H4K20 [56]. Two aromatic residues from the second Tudor sequence and one from the first generate an aromatic cage which, in conjunction with a nearby aspartate residue, forms a binding pocket for trimethyllysine. In contrast to the double Tudor domain of JMJD2A, mammalian p53-binding protein 1 (53BP1) has been shown to recognize the lower lysine methylation states of both p53 and H4K20 [57, 58]. The ability of 53BP1 to bind both H4K20me2 and p53K382me2 is important because it facilitates p53 recruitment to DNA damage sites, where the H4K20me2 modification is prevalent, thereby promoting DNA double-strand break repair [59]. Despite significant sequence homology, the structure of 53BP1 differs appreciably from that of JMJD2A: 53BP1 consists of two independently folded Tudor domains, the first of which is primarily responsible for methyl-lysine recognition, assisted by the second (Fig. 6B). The specificity of 53BP1 for Kme1 and Kme2 is the consequence both of an intermolecular hydrogen bond between the mono- or dimethylammonium group and an aspartic acid residue in the central binding pocket, and of the pocket being apparently too small to accommodate a trimethyllysine mark [60]. In addition, 53BP1 recognizes a neighboring basic residue, such as an arginine (H4R19) (Fig. 6B) or a lysine (p53K381), in a second aromatic pocket.

Although the characterization of Tudor domains as methyl-arginine readers is less well documented, there is substantial evidence that the SMN (Survival of Motor Neuron) Tudor domain, which is linked to spinal muscular atrophy, recognizes the arginine-glycine-rich C-terminal tails of spliceosomal Sm proteins, and that this binding event is mediated by symmetrical dimethylation of arginine side chains [61, 62]. This interaction is proposed to be facilitated by the positioning of a symmetrically dimethylated arginine side chain near a cluster of conserved aromatic residues, forming a typical cage-like mode of recognition. In another case, members of the Tudor family have been shown to associate with PIWI (P-element-induced wimpy testis) proteins specifically through sRme2 [54, 63]. More recently, the human Tudor protein SND1 (staphylococcal nuclease domain-containing 1) was shown to bind PIWIL1 in an arginine methylation-dependent manner, suggesting a previously undescribed function for SND1 in regulating piRNA pathways [63]. Crystal structures revealed that the intact extended Tudor domain of SND1 forms a wide, negatively charged binding groove that can accommodate sRme2 peptides from PIWIL1 in different orientations [63].

CHROMODOMAINS 

The chromodomain, a highly conserved module of 40-50 amino acids, defines a family of methyl-lysine reader proteins found in both plants and animals and comprising 34 known members [53]. One of the earliest and best-known protein-protein interactions induced by methyl-lysine recognition is the binding of H3K9me3 by the chromodomain of HP1 (heterochromatin-associated protein 1), which in turn results in gene silencing. HP1 and other members of the chromodomain family generally consist of an N-terminal three-stranded antiparallel β-sheet that packs against a C-terminal α-helix [64]. For HP1, binding affinities in the low micromolar range have been reported for both H3K9me3 and H3K9me2 [65, 66]. The binding cage consists of three aromatic residues that form a conserved pocket into which the methylammonium group inserts (see Fig. 7A for a human homolog of HP1). Mutation of any of these aromatic residues drastically reduces affinity for the methylated histone tail. Furthermore, residues 5-10 of the histone tail (QTARK9S) interact with the chromodomain through induced-fit sandwiching between terminal β-strands, completing a five-stranded antiparallel β-sheet [65]. Mutation studies of residues in both the peptide and the protein have confirmed the contribution of intermolecular contacts along the extended surface groove to both binding affinity and selectivity [67].

Fig. (6). A) Double Tudor domain of JMJD2A with the aromatic cavity bound to H3K9me2 (PDB: 2OX0). B) Binding pocket of the 53BP1 tandem Tudor domain bound to H4K20me2 (PDB: 2IG0). In both structures, the two Tudor domains are shown in blue and green, respectively. The trimethyl-lysine in A and the protonated dimethyl-lysine and the neighboring H4R19 residue in B are displayed in ball-and-stick model with gray carbon atoms. Key binding site residues are displayed in stick model with white carbon atoms.

58 Current Chemical Genomics, 2011, Volume 5 Herold et al.

Fig. (7). A) Aromatic binding pocket of CBX5 (human HP1α) bound to H3K9me3 (PDB: 3FDT). B) Chromodomain CHD1 binding pocket with H3K4me3 (PDB: 2B2W). In both structures, the trimethylated lysines are displayed in ball-and-stick model with gray carbon atoms; key binding site residues are displayed in stick model with white carbon atoms.
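To relate the low-micromolar HP1 affinities mentioned above to binding energies, the standard relation ΔG° = RT ln(Kd/c°), with a 1 M standard state c°, can be applied. The snippet is an illustration using assumed round Kd values, not the measured values of refs [65, 66]; it also shows that each tenfold gain in affinity corresponds to about 1.36 kcal/mol at 25 °C.

```python
import math

R_KCAL = 1.987204e-3  # gas constant, kcal/(mol K)


def binding_free_energy(kd_molar: float, temp_k: float = 298.15) -> float:
    """Standard binding free energy (kcal/mol) from a dissociation constant,
    Delta G = RT ln(Kd / 1 M), using a 1 M standard state."""
    if kd_molar <= 0:
        raise ValueError("Kd must be positive")
    return R_KCAL * temp_k * math.log(kd_molar)


if __name__ == "__main__":
    for kd in (1e-6, 10e-6):  # illustrative low-micromolar values (assumed)
        print(f"Kd = {kd * 1e6:.0f} uM -> dG ~ {binding_free_energy(kd):.1f} kcal/mol")
```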

In comparison with HP1, methyl-lysine recognition by chromobox (CBX) proteins (for example, Polycomb) involves fewer contacts with the residues surrounding the methyl-lysine [68]. While this is often associated with a decrease in sequence selectivity, such non-sequence-selective recognition domains may be more amenable to inhibition by small molecules because of their limited binding sites. Furthermore, tandem chromodomains have been reported, such as those of the CHD (chromo helicase DNA-binding) proteins, in which the two chromodomains are bridged by a two-helix linker to form a continuous surface. The human CHD1 double chromodomain, for example, interacts with methylated H3K4, a hallmark of active chromatin. The two CHD1 chromodomains cooperatively interact with one methylated H3 tail at the chromodomain junction, using only two aromatic residues for methyl-lysine recognition, in contrast to the three-residue aromatic cage of HP1 (Fig. 7B) [69].

While ongoing research is helping to elucidate the binding sites and interacting partners of various chromodomain-containing proteins, it is also becoming increasingly clear that mutations in such domains are closely linked to a variety of disorders. In 2004, Vissers and co-workers showed that heterozygous mutations in chromodomain helicase DNA-binding protein 7 (CHD7) cause CHARGE syndrome [70, 71] (Coloboma of the eye, Heart defects, Atresia of the choanae, severe Retardation of growth and development, Genital abnormalities, and Ear abnormalities), a disease with an estimated incidence of approximately 1 in 10,000 newborns. A recent study combining chromatin immunoprecipitation with microarray technology (ChIP-chip) identified the role of CHD7 in gene expression. CHD7 was shown to bind mono- and dimethylated H3K4 in enhancer regions of numerous genes in a highly cell type-dependent manner. CHD7 localization sites were observed to change concomitantly with H3K4 methylation patterns during cell differentiation, indicating that the H3K4 methylation mark defines the lineage-specific association of CHD7 with specific sites on chromatin [72]. A selective small-molecule ligand of CHD7 could therefore serve as a tool to elucidate the biology behind CHARGE syndrome.

PWWP DOMAINS 

The proline-tryptophan-tryptophan-proline (PWWP) motif is a structural domain that recognizes various methylation states of lysine residues and is found in both non-catalytic proteins and enzymes involved in chromatin biology. The PWWP domain of Brpf1 was recently characterized by NMR and crystallography, revealing the apo form of this methyl-lysine reader as well as its complex with an H3K36me3 peptide (Fig. 8) [73].

The PWWP domain is a member of the Royal family and is instrumental for the assembly of a complex involved in acute myeloid leukemia (AML) and mixed-lineage leukemia (MLL) after chromosomal translocations [74]. As discussed above for the PHD family, targeting such oncogenic fusion proteins could prove significant for drug discovery (Fig. 8) [29].

WD40 REPEAT 

The WD40 repeat, also known as the beta-transducin repeat, is a protein domain found in numerous readers and enzymes. The first WD40-repeat protein reported to bind histone modifications is WDR5, a component of the MLL/SET1 methyltransferase complex, which binds di- or trimethylated H3K4 [75]. Recently, a second WD40-repeat protein, EED, was reported to associate with and regulate the activity and specificity of polycomb repressive complex 2 (PRC2) [76]. Xu and co-workers demonstrated that this reader protein directly modulates the methyltransferase activity through its binding of various methylated peptides.
Fig. (8). PWWP domain of Brpf1: binding pocket with H3K36me3 (PDB: 2X4Y). The trimethylated lysine is displayed in ball and stick model with gray carbon atoms, and key binding site residues are displayed in stick model with white carbon atoms.



Using different trimethylated peptides, the authors showed that H3K27me3 stimulates the methyltransferase activity more strongly than other trimethylated peptides. Interestingly, EED itself can become methylated by PRC2 when H1K26me3 is the ligand (Fig. 9).




Fig. (9). EED as an example of a WD40 domain binding to 
H3K27me3 (PDB: 3JZG). The trimethylated lysine is displayed in 
ball and stick model with gray carbon atoms, key binding site resi- 
dues are displayed in stick model with white carbon atoms. 

DRUG DISCOVERY ENABLED BY CHEMICAL PROBES

The readers of PTMs associated with chromatin regulation currently represent unexplored territory for drug discovery. For these protein classes to become part of the druggable genome [77], both their tractability and their validity need to be experimentally verified. Recent reports of potent, selective, cell-active small molecule antagonists of acetyl-lysine recognition that target the bromodomain BET subfamily [29, 30] provide significant encouragement that both issues can be addressed for at least some of the readers of the histone code.

Our strategy is to take a protein-family approach [77] toward the therapeutically unbiased discovery of high-quality chemical probes for readers of methyl-lysine [78, 79]. Initial results of ligand discovery via experimental and virtual screening [43, 44, 80] have confirmed some of the anticipated challenges in potency for these weakly interacting domains, but also show promising selectivity, even among related MBT domain-containing proteins (Table 1). A broad and unbiased approach to probe discovery increases the chances of finding potency-enhancing features in ligands, because each ligand hypothesis is tested against a large number of functionally homologous but structurally distinct binding sites (as reviewed above). In addition, this approach naturally annotates ligand selectivity and builds structure-activity relationships across methyl-lysine reader families. High-quality probes for methyl-lysine binders will be useful for exploring the biology of this large and diverse family and will lay the foundation for drug discovery as validated and tractable targets are revealed.